{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ur8xi4C7S06n"
   },
   "outputs": [],
   "source": [
    "# Copyright 2023 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "JAPoU8Sm5E6e"
   },
   "source": [
    "# Question Answering with Generative Models on Vertex AI\n",
    "\n",
    "\n",
    "<table align=\"left\">\n",
    "\n",
    "  <td>\n",
    "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/language/examples/prompt-design/question_answering.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/prompt-design/question_answering.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n",
    "      View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/prompt-design/question_answering.ipynb\">\n",
    "      <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\">\n",
    "      Open in Vertex AI Workbench\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tvgnzT1CKxrO"
   },
   "source": [
    "## Overview\n",
    "\n",
    "Large language models can be used for various natural language processing tasks, including question-answering (Q&A). These models are trained on a vast amount text data and can generate high-quality responses to a wide range of questions. One thing to note here is that most models have cutoff dates regarding their knowledge, and asking anything too recent might yield an incomplete, imaginative or incorrect answer (i.e. a hallucination).\n",
    "\n",
    "This notebook covers the essentials of prompts for answering questions using a generative model. In addition, it showcases the `open domain` (knowledge available on the public internet) and `closed domain` (knowledge that is more private - typically enterprise or personal knowledge).\n",
    "\n",
    "Learn more about prompt design in the [official documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview#prompt_structure)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d975e698c9a4"
   },
   "source": [
    "### Objective\n",
    "\n",
    "By the end of the notebook, you should be able to write prompts for the following:\n",
    "\n",
    "* **Open domain** questions:\n",
    "    * Zero-shot prompting\n",
    "    * Few-shot prompting\n",
    "\n",
    "\n",
    "* **Closed domain** questions:\n",
    "    * Providing custom knowledge as context\n",
    "    * Instruction-tune the outputs\n",
    "    * Few-shot prompting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QDU0XJ1xRDlL"
   },
   "source": [
    "## Getting Started"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2a5AEr0lkLKD"
   },
   "source": [
    "### Install Vertex AI SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "82ad0c445061"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: google-cloud-aiplatform in /opt/conda/lib/python3.7/site-packages (1.29.0)\n",
      "Collecting google-cloud-aiplatform\n",
      "  Obtaining dependency information for google-cloud-aiplatform from https://files.pythonhosted.org/packages/5e/c9/bc727aa6d015128a728eb9fda4378b5493f5131c92b7e970bbf5f4c3eba8/google_cloud_aiplatform-1.31.0-py2.py3-none-any.whl.metadata\n",
      "  Downloading google_cloud_aiplatform-1.31.0-py2.py3-none-any.whl.metadata (25 kB)\n",
      "Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.34.0)\n",
      "Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.22.3)\n",
      "Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (3.20.3)\n",
      "Requirement already satisfied: packaging>=14.3 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (23.1)\n",
      "Requirement already satisfied: google-cloud-storage<3.0.0dev,>=1.32.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (2.10.0)\n",
      "Requirement already satisfied: google-cloud-bigquery<4.0.0dev,>=1.15.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (3.11.4)\n",
      "Requirement already satisfied: google-cloud-resource-manager<3.0.0dev,>=1.3.3 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.10.3)\n",
      "Requirement already satisfied: shapely<2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.8.5.post1)\n",
      "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.60.0)\n",
      "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2.22.0)\n",
      "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2.31.0)\n",
      "Requirement already satisfied: grpcio<2.0dev,>=1.33.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.56.2)\n",
      "Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.48.2)\n",
      "Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.3.3)\n",
      "Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.5.0)\n",
      "Requirement already satisfied: python-dateutil<3.0dev,>=2.7.2 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.8.2)\n",
      "Requirement already satisfied: grpc-google-iam-v1<1.0.0dev,>=0.12.4 in /opt/conda/lib/python3.7/site-packages (from google-cloud-resource-manager<3.0.0dev,>=1.3.3->google-cloud-aiplatform) (0.12.6)\n",
      "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (5.3.1)\n",
      "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (0.3.0)\n",
      "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (4.9)\n",
      "Requirement already satisfied: six>=1.9.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.16.0)\n",
      "Requirement already satisfied: urllib3<2.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.26.16)\n",
      "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.5.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (3.2.0)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (3.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2023.7.22)\n",
      "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (0.5.0)\n",
      "Downloading google_cloud_aiplatform-1.31.0-py2.py3-none-any.whl (2.8 MB)\n",
      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.8/2.8 MB\u001b[0m \u001b[31m36.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
      "\u001b[?25hInstalling collected packages: google-cloud-aiplatform\n",
      "\u001b[33m  WARNING: The script tb-gcp-uploader is installed in '/home/jupyter/.local/bin' which is not on PATH.\n",
      "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n",
      "\u001b[0mSuccessfully installed google-cloud-aiplatform-1.31.0\n"
     ]
    }
   ],
   "source": [
    "!pip install google-cloud-aiplatform --upgrade --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install additional Python Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q python-Levenshtein --upgrade --user\n",
    "!pip install -q fuzzywuzzy --upgrade --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following cell restarts the notebook kernel. For Vertex AI Workbench you can restart from the terminal using the kernel status button on top. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "_Hsqwn4hkLKE"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'status': 'ok', 'restart': True}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Automatically restart kernel after package installs so that your environment can access the new packages\n",
    "import IPython\n",
    "\n",
    "app = IPython.Application.instance()\n",
    "app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "960505627ddf"
   },
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "PyQmSRbKA8r-"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from vertexai.language_models import TextGenerationModel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UP76a2la7O-a"
   },
   "source": [
    "### Import models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "7isig7e07O-a"
   },
   "outputs": [],
   "source": [
    "generation_model = TextGenerationModel.from_pretrained(\"text-bison@001\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fIPcn5dZ7O-b"
   },
   "source": [
    "## Question Answering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cNNEz7vGFYUP"
   },
   "source": [
    "Question-answering capabilities require providing a prompt or a question that the model can use to generate a response. The prompt can be a few words or a few complete sentences, depending on the complexity of the question.\n",
    "\n",
    "When creating a question-answering prompt, it is essential to be specific and provide as much context as possible. It helps the model understand the intent behind the question and generate a relevant response. For example, if you want to ask:\n",
    "\n",
    "```\n",
    "\"What is the capital of France?\",\n",
    "\n",
    "then a good prompt could be:\n",
    "\n",
    "\"Please tell me the name of the city that serves as the capital of France.\"\n",
    "\n",
    "```\n",
    "\n",
    "In addition to being specific, the prompt should also be grammatically correct and free of spelling errors. It helps the model generate a response that is easy to understand and contains fewer errors or inaccuracies.\n",
    "\n",
    "By providing specific, context-rich prompts, you can help the model understand the intent behind the question and generate accurate and relevant responses.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C5N9ZnlECm-z"
   },
   "source": [
    "Below are some differences between the **open domain** and **closed domain** categories for question-answering prompts.\n",
    "\n",
    "* **Open domain**: All questions whose answers are available online already. They can belong to any category, like history, geography, countries, politics, chemistry, etc. These include trivia or general knowledge questions, like:\n",
    "\n",
    "```\n",
    "Q: Who won the Olympic gold in swimming?\n",
    "Q: Who is the President of [given country]?\n",
    "Q: Who wrote [specific book]\"?\n",
    "```\n",
    "\n",
    "Keep in mind the training cutoff of generative models, as questions involving information more recent than what the model was trained on might give incorrect or imaginative answers.\n",
    "\n",
    "\n",
    "* **Closed domain**: If you have some internal knowledge base not available on the public internet, then those belong to the _closed domain_ category.\n",
    "You can pass that \"private\" knowledge as context to the model. If prompted correctly, the model is more likely to answer from within the context provided and less likely to give answers beyond that from the open internet.\n",
    "\n",
    "Consider the example of building a Q&A bot over your internal product documentation. In this case, you can pass the complete documentation to the model and prompt it only to answer based on that.\n",
    "\n",
    "Typical prompt for **closed domain**:\n",
    "\n",
    "```\n",
    "Prompt: f\"\"\" Answer from the below context: \\n\\n\n",
    "\t\t   context: {your knowledge base} \\n\n",
    "\t\t   question: {question specific to that knowledge base}  \\n\n",
    "\t\t   answer: {to be predicted by model} \\n\n",
    "\t\t\"\"\"\n",
    "```\n",
    "\n",
    "Below are some examples to understand these different types of prompts."
   ]
  },
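  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The closed-domain template above can be assembled with a small helper. The following is a minimal sketch, not part of the Vertex AI SDK: the function name `build_closed_domain_prompt` and the sample context are hypothetical and chosen only for illustration.\n",
    "\n",
    "```python\n",
    "def build_closed_domain_prompt(context, question):\n",
    "    # Hypothetical helper: lay out an instruction, the context, and the question,\n",
    "    # leaving the answer slot empty for the model to fill in.\n",
    "    lines = [\n",
    "        'Answer the question using only the context below.',\n",
    "        '',\n",
    "        f'Context: {context.strip()}',\n",
    "        '',\n",
    "        f'Question: {question.strip()}',\n",
    "        'Answer:',\n",
    "    ]\n",
    "    return '\\n'.join(lines)\n",
    "\n",
    "\n",
    "# Made-up knowledge-base snippet, for illustration only.\n",
    "sample_context = 'Returns are accepted within 30 days of purchase.'\n",
    "prompt = build_closed_domain_prompt(sample_context, 'How long do I have to return an item?')\n",
    "print(prompt)\n",
    "```\n",
    "\n",
    "The resulting string can be passed to `generation_model.predict(prompt)` like the other prompts in this notebook."
   ]
  },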
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WBoN6zixDSiX"
   },
   "source": [
    "### Open Domain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wJnv8XhnDXQm"
   },
   "source": [
    "#### Zero-shot prompting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "PaYoQuRwCm-z"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#BeachLifestyleDreamHome\n"
     ]
    }
   ],
   "source": [
    "prompt = \"\"\"Create a catchy hashtag based on this house description: \"Coastal Elegance Awaits! Immerse yourself in beach living with this \n",
    "stunning 4-bedroom, 3-bathroom home. Recently renovated with over $200,000 in upgrades, this move-in ready haven offers the perfect blend of \n",
    "style and functionality. Step inside to discover a beautifully updated kitchen featuring quartz counters, stainless steel appliances, and \n",
    "a vent hood. The spacious dining area leads to a cozy formal living room with a wood-burning fireplace, creating a welcoming space for \n",
    "gatherings. All three bathrooms have been completely remodeled, showcasing new vanities, quartz counters, and elegant tubs/showers with glass \n",
    "enclosures. The master bedroom addition provides an expansive retreat with a walk-in closet, while a second master bedroom at the front offers \n",
    "versatility. This delightful home boasts bonus features like all-new LVT flooring, smooth ceilings, recessed lighting, dual-pane windows, and \n",
    "a Nest thermostat for modern comfort. Outside, the private rear yard is bordered by block wall fencing, providing a serene space to relax or \n",
    "entertain. A large side yard offers additional opportunities for outdoor enjoyment. Situated in a quiet neighborhood, the property is within\n",
    "walking distance to award-winning schools, a park, golf, shopping, and restaurants. Plus, you're only a 5-minute drive from the beach and the \n",
    "vibrant Lovely Mall. Don't miss out on this coastal gem with luxurious upgrades and a convenient location. Schedule a showing today and \n",
    "experience the beach lifestyle you've been dreaming of!\n",
    "\"\"\"\n",
    "\n",
    "print(\n",
    "    generation_model.predict(\n",
    "        prompt,\n",
    "        max_output_tokens=256,\n",
    "        temperature=0.1,\n",
    "    ).text\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HShw52X2Dcmx"
   },
   "source": [
    "#### Few-shot prompting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tj_2hHAWE8vh"
   },
   "source": [
    "Let's say you want to a get a short answer from the model (like only a specific name). To do so, you can leverage a few-shot prompt and provide examples to the model to illustrate the expected behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "RE5yCAaqDg7m"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#BeachsideRetreat\n"
     ]
    }
   ],
   "source": [
    "prompt = \"\"\"Your real estate company aims to establish a strong social media presence and is eager to create a catchy hashtag based on long \n",
    "house descriptions in ads.\n",
    "\n",
    "\n",
    "input: Introducing 1234 Seaside Drive, a beautifully remodeled 2-story home with breathtaking ocean views from both levels. The upper level \n",
    "boasts an inviting open floor plan with vaulted beamed ceilings and a charming fireplace. The kitchen features a stunning oversized quartz \n",
    "island and top-of-the-line appliances. The luxurious primary suite offers ocean vistas, vaulted ceilings, and an en-suite bath with dual sinks\n",
    "and a glass-tiled walk-in shower. Downstairs, two generously sized bedrooms with ocean views and a stylish bathroom await. Step outside to the\n",
    "expansive patio, perfect for entertaining amidst a tasteful succulent garden. Situated in the sought-after Coastal Heights neighborhood, just \n",
    "minutes from shops, dining, beaches, and natural parks. Your dream coastal retreat awaits!\n",
    "\n",
    "\n",
    "output: #CoastalVistaRetreat\n",
    "\n",
    "\n",
    "input: Experience beach resort living in this carefree condo nestled on a hill with panoramic views of the Horizon Bay Resort. Immerse yourself\n",
    "in the serene sounds of waves at Sandy Cove and Sunset Beach, just a stone's throw away across Ocean Breeze Highway. This two-bedroom upper \n",
    "level condo offers an open floor plan, custom lighting, and a breathtaking ocean view from your private deck. Indulge in the remodeled kitchen \n",
    "with sleek stainless steel appliances and stylish granite countertops. The bathroom boasts impeccable quality, adding a touch of luxury to your \n",
    "coastal retreat. Both bedrooms feature ceiling fans and ample natural light, offering tranquil views of the coastal hills. Storage is a breeze \n",
    "with the shaded, over-sized carport area and large attic space with a pull-ladder. Relax at the resort-quality association pool, complete with \n",
    "lounge chairs, umbrellas, and two restrooms with showers, all while enjoying sweeping views over Bluewater Creek and the glistening Pacific.\n",
    "Enjoy an active lifestyle with nearby yoga in the park, golf, hiking trails, shops, and dining. Multiple luxury resorts are mere minutes away.\n",
    "Hop aboard the Seaside Express and Coastal Breeze trolleys, conveniently stopping at the base of the hill, offering easy access to summer \n",
    "concerts, festivals, and vibrant coastal attractions. Don't miss this chance to make your seaside dreams come true! Contact us now for a private\n",
    "showing.\n",
    "\n",
    "\n",
    "output: #SeasideEscape\n",
    "\n",
    "\n",
    "input: Coastal Elegance Awaits! Immerse yourself in beach living with this stunning 4-bedroom, 3-bathroom home. Recently renovated with over \n",
    "$200,000 in upgrades, this move-in ready haven offers the perfect blend of style and functionality. Step inside to discover a beautifully \n",
    "updated kitchen featuring quartz counters, stainless steel appliances, and a vent hood. The spacious dining area leads to a cozy formal living\n",
    "room with a wood-burning fireplace, creating a welcoming space for gatherings. All three bathrooms have been completely remodeled, showcasing \n",
    "new vanities, quartz counters, and elegant tubs/showers with glass enclosures. The master bedroom addition provides an expansive retreat with \n",
    "a walk-in closet, while a second master bedroom at the front offers versatility.This delightful home boasts bonus features like all-new LVT \n",
    "flooring, smooth ceilings, recessed lighting, dual-pane windows, and a Nest thermostat for modern comfort. Outside, the private rear yard is \n",
    "bordered by block wall fencing, providing a serene space to relax or entertain. A large side yard offers additional opportunities for outdoor \n",
    "enjoyment. Situated in a quiet neighborhood, the property is within walking distance to award-winning schools, a park, golf, shopping, and \n",
    "restaurants. Plus, you're only a 5-minute drive from the beach and the vibrant Lovely Mall. Don't miss out on this coastal gem with luxurious \n",
    "upgrades and a convenient location. Schedule a showing today and experience the beach lifestyle you've been dreaming of!\n",
    "\n",
    "\n",
    "output:\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "print(\n",
    "    generation_model.predict(\n",
    "        prompt,\n",
    "        max_output_tokens=20,\n",
    "        temperature=0.1,\n",
    "    ).text\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xGvs0jFsUlvM"
   },
   "source": [
    "#### Zero-shot prompting vs Few-shot prompting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7yjsAMuMUfZC"
   },
   "source": [
    "Zero-shot prompting can be useful for quickly generating text for new tasks, but the quality of the generated text may be lower than that of a few-shot prompt with well-chosen examples. Few-shot prompting is typically better suited for tasks that require a high degree of specificity or domain-specific knowledge, but requires some additional thought and potentially data to set up the prompt."
   ]
  },
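  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assembling a few-shot prompt by hand becomes error-prone as the number of examples grows. The following is a minimal sketch of a helper that builds the `input:` / `output:` layout used in the few-shot cell above from a list of pairs. The name `build_few_shot_prompt` and the shortened sample descriptions are hypothetical and chosen only for illustration.\n",
    "\n",
    "```python\n",
    "def build_few_shot_prompt(instruction, examples, new_input):\n",
    "    # Hypothetical helper: `examples` is a list of (input, output) pairs.\n",
    "    parts = [instruction.strip()]\n",
    "    for example_input, example_output in examples:\n",
    "        parts.append(f'input: {example_input.strip()}')\n",
    "        parts.append(f'output: {example_output.strip()}')\n",
    "    # The new input goes last, with an empty output slot for the model to complete.\n",
    "    parts.append(f'input: {new_input.strip()}')\n",
    "    parts.append('output:')\n",
    "    # A blank line between blocks keeps the examples visually separated.\n",
    "    return '\\n\\n'.join(parts)\n",
    "\n",
    "\n",
    "# Shortened, made-up descriptions; real listings would be much longer.\n",
    "examples = [\n",
    "    ('Remodeled 2-story home with ocean views.', '#CoastalVistaRetreat'),\n",
    "    ('Hillside condo near the beach with a pool.', '#SeasideEscape'),\n",
    "]\n",
    "prompt = build_few_shot_prompt(\n",
    "    'Create a catchy hashtag for each house description.',\n",
    "    examples,\n",
    "    '4-bedroom home, 5-minute drive from the beach.',\n",
    ")\n",
    "print(prompt)\n",
    "```\n",
    "\n",
    "Keeping the examples in a list makes it easy to add, remove, or reorder them and compare how the model's responses change."
   ]
  },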
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u6UiJTxXEs4t"
   },
   "source": [
    "### Closed Domain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "03ZITm4AGBvP"
   },
   "source": [
    "#### Adding internal knowledge as context in prompts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EkhqjmB6VqPx"
   },
   "source": [
    "Imagine a scenario where you would like to build a question-answering bot that takes in internal documentation and lets users ask questions about it.\n",
    "\n",
    "In the example below, the context is added to the prompt, so that the PaLM API can use that to answer subsequent questions with the provided context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "s1H2er_lExpW"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Prompt]\n",
      "Answer the question given in the context below:\n",
      "Context: \n",
      "Creating catchy hashtags for a real estate company involves understanding your target audience, being creative, and aligning with your brand \n",
      "identity. Here are some tips to help you create compelling and effective hashtags:\n",
      "Keep them short and sweet. Hashtags should be easy to remember and type, so keep them to 1-2 words or a short phrase.\n",
      "Use relevant keywords. When people are searching for real estate, they'll likely use relevant keywords in their search terms. Make sure to \n",
      "include these keywords in your hashtags so that your listings will show up in their search results. \n",
      "Be creative. Don't be afraid to get creative with your hashtags. Use puns, wordplay, or other creative techniques to make your hashtags stand \n",
      "out.\n",
      "Use trending hashtags. If there are any trending hashtags related to real estate, use them in your posts. This will help your listings get \n",
      "seen by more people.\n",
      "Use branded hashtags. Create your own branded hashtags that you can use across all of your social media channels. This will help people to \n",
      "identify your brand and remember your listings.\n",
      "?\n",
      "Question: What is the best way to create a catchy hashtag based on a real estate description ? \n",
      "Answer:\n",
      "\n",
      "[Response]\n",
      "Keep them short and sweet. Hashtags should be easy to remember and type, so keep them to 1-2 words or a short phrase.\n"
     ]
    }
   ],
   "source": [
    "context = \"\"\"\n",
    "Creating catchy hashtags for a real estate company involves understanding your target audience, being creative, and aligning with your brand \n",
    "identity. Here are some tips to help you create compelling and effective hashtags:\n",
    "Keep them short and sweet. Hashtags should be easy to remember and type, so keep them to 1-2 words or a short phrase.\n",
    "Use relevant keywords. When people are searching for real estate, they'll likely use relevant keywords in their search terms. Make sure to \n",
    "include these keywords in your hashtags so that your listings will show up in their search results. \n",
    "Be creative. Don't be afraid to get creative with your hashtags. Use puns, wordplay, or other creative techniques to make your hashtags stand \n",
    "out.\n",
    "Use trending hashtags. If there are any trending hashtags related to real estate, use them in your posts. This will help your listings get \n",
    "seen by more people.\n",
    "Use branded hashtags. Create your own branded hashtags that you can use across all of your social media channels. This will help people to \n",
    "identify your brand and remember your listings.\n",
    "\"\"\"\n",
    "\n",
    "question = \"What is the best way to create a catchy hashtag based on a real estate description ?\"\n",
    "\n",
    "prompt = f\"\"\"Answer the question given in the context below:\n",
    "Context: {context}?\n",
    "Question: {question} \n",
    "Answer:\n",
    "\"\"\"\n",
    "\n",
    "print(\"[Prompt]\")\n",
    "print(prompt)\n",
    "\n",
    "print(\"[Response]\")\n",
    "print(\n",
    "    generation_model.predict(\n",
    "        prompt,\n",
    "    ).text\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tagWC4VcQIw6"
   },
   "source": [
    "#### Instruction-tuning outputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "v9UkogWHXM6N"
   },
   "source": [
    "Another way to help out language models is to provide additional instructions to frame the output in the prompt. To ensure the model doesn't respond to anything outside the context, the prompt can specify that the response should be \"Information not available in provided context\" if that's the case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "ouq8FfwSQIBT"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Prompt]\n",
      "Answer the question given the context below as {Context:}. \n",
      "\n",
      "If the answer is not available in the {Context:} and you are not confident about the output,\n",
      "please say \"Information not available in provided context\". \n",
      "\n",
      "\n",
      "Context: \n",
      "Creating catchy hashtags for a real estate company involves understanding your target audience, being creative, and aligning with your brand \n",
      "identity. Here are some tips to help you create compelling and effective hashtags:\n",
      "Keep them short and sweet. Hashtags should be easy to remember and type, so keep them to 1-2 words or a short phrase.\n",
      "Use relevant keywords. When people are searching for real estate, they'll likely use relevant keywords in their search terms. Make sure to \n",
      "include these keywords in your hashtags so that your listings will show up in their search results. \n",
      "Be creative. Don't be afraid to get creative with your hashtags. Use puns, wordplay, or other creative techniques to make your hashtags stand \n",
      "out.\n",
      "Use trending hashtags. If there are any trending hashtags related to real estate, use them in your posts. This will help your listings get \n",
      "seen by more people.\n",
      "Use branded hashtags. Create your own branded hashtags that you can use across all of your social media channels. This will help people to \n",
      "identify your brand and remember your listings.\n",
      "?\n",
      "\n",
      "Question: Tell me the current market price of a specific property in a particular city? \n",
      "\n",
      "Answer:\n",
      "\n",
      "[Response]\n",
      "Information not available in provided context\n"
     ]
    }
   ],
   "source": [
    "question = \"Tell me the current market price of a specific property in a particular city?\"\n",
    "prompt = f\"\"\"Answer the question given the context below as {{Context:}}. \\n\n",
    "If the answer is not available in the {{Context:}} and you are not confident about the output,\n",
    "please say \"Information not available in provided context\". \\n\\n\n",
    "Context: {context}?\\n\n",
    "Question: {question} \\n\n",
    "Answer:\n",
    "\"\"\"\n",
    "\n",
    "print(\"[Prompt]\")\n",
    "print(prompt)\n",
    "\n",
    "print(\"[Response]\")\n",
    "print(\n",
    "    generation_model.predict(\n",
    "        prompt,\n",
    "        max_output_tokens=256,\n",
    "        temperature=0.3,\n",
    "    ).text\n",
    ")\n"
   ]
  },
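  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the prompt above fixes the exact fallback sentence, out-of-context answers can also be detected in code. The following is a minimal sketch, not part of the original notebook: `FALLBACK`, `build_grounded_prompt`, and `is_fallback` are hypothetical helper names, and the prompt wording only mirrors the cell above. The sketch builds and inspects strings; sending the prompt to `generation_model.predict` is left out.\n",
    "\n",
    "```python\n",
    "# Hypothetical helpers for illustration; they are not part of the Vertex AI SDK.\n",
    "FALLBACK = \"Information not available in provided context\"\n",
    "\n",
    "\n",
    "def build_grounded_prompt(context: str, question: str) -> str:\n",
    "    \"\"\"Build a prompt that tells the model to answer only from the context.\"\"\"\n",
    "    return (\n",
    "        \"Answer the question given the context below as {Context:}.\\n\\n\"\n",
    "        \"If the answer is not available in the {Context:} and you are not \"\n",
    "        \"confident about the output,\\n\"\n",
    "        f'please say \"{FALLBACK}\".\\n\\n'\n",
    "        f\"Context: {context}\\n\\n\"\n",
    "        f\"Question: {question}\\n\\n\"\n",
    "        \"Answer:\\n\"\n",
    "    )\n",
    "\n",
    "\n",
    "def is_fallback(response: str) -> bool:\n",
    "    \"\"\"Return True when the response contains the fallback sentence.\"\"\"\n",
    "    return FALLBACK.lower() in response.strip().lower()\n",
    "```\n",
    "\n",
    "A check like `is_fallback(response_text)` could then be used to count how often the model declines to answer, assuming the model repeats the fallback sentence verbatim."
   ]
  },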
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iZJfZShPRGqU"
   },
   "source": [
    "#### Few-shot prompting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "qdSEQeQIS6pt"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "With the rapid growth of social media platforms and their influence on the real estate industry, the use of housing hashtags has become a \n",
      "crucial marketing strategy for real estate agents and property sellers.\n"
     ]
    }
   ],
   "source": [
    "prompt = \"\"\"\n",
    "Context:\n",
    "As the housing market experiences fluctuations in demand, supply, and economic conditions, the concept of \"rent-to-own\" arrangements has \n",
    "gained popularity among both tenants and homeowners. Rent-to-own, also known as lease-to-own or rent-to-buy, is a housing arrangement that \n",
    "offers potential buyers an alternative path to homeownership, particularly for those who may face challenges in qualifying for traditional \n",
    "mortgages or are not ready for an immediate purchase.\n",
    "\n",
    "Question:\n",
    "What is a rent-to-own housing arrangement, and how does it work for both tenants and homeowners?\n",
    "\n",
    "Answer:\n",
    "A rent-to-own housing arrangement is a contract between a tenant and a homeowner where the tenant has the option to buy the property at a \n",
    "predetermined price after a specified rental period. It offers potential buyers a pathway to homeownership while providing homeowners with a \n",
    "secure rental income and a potential future sale.\n",
    "---\n",
    "\n",
    "Context:\n",
    "The concept of \"housing bubbles\" was first widely discussed in the early 2000s after a series of real estate market crashes in various \n",
    "countries. Since then, the phenomenon of housing bubbles has become a significant topic of interest in the housing market, with potential \n",
    "implications for both buyers and sellers.\n",
    "\n",
    "Question:\n",
    "When were housing bubbles first widely discussed?\n",
    "\n",
    "Answer:\n",
    "Housing bubbles were first widely discussed in the early 2000s after a series of real estate market crashes in various countries.\n",
    "\n",
    "---\n",
    "\n",
    "Context:\n",
    "With the rapid growth of social media platforms and their influence on the real estate industry, the use of housing hashtags has become a \n",
    "crucial marketing strategy for real estate agents and property sellers. Crafting the right hashtags can significantly impact the visibility \n",
    "and reach of property listings, helping to attract potential buyers and generate interest in the housing market.\n",
    "\n",
    "Question: Why are housing hashtags so crucial ?\n",
    "\n",
    "Answer:\n",
    "\n",
    "\"\"\"\n",
    "print(\n",
    "    generation_model.predict(\n",
    "        prompt,\n",
    "    ).text\n",
    ")\n"
   ]
  },
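  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The few-shot prompt above is written out by hand. When the solved examples live in a list, the same layout can be assembled programmatically. This is a minimal sketch under assumptions: `build_few_shot_prompt` and the `(context, question, answer)` tuple layout are hypothetical, and the example texts are shortened versions of the ones above.\n",
    "\n",
    "```python\n",
    "def build_few_shot_prompt(examples, context, question):\n",
    "    \"\"\"Join solved examples and one open question, separated by '---'.\"\"\"\n",
    "    blocks = [\n",
    "        f\"Context:\\n{c}\\n\\nQuestion:\\n{q}\\n\\nAnswer:\\n{a}\\n\"\n",
    "        for c, q, a in examples\n",
    "    ]\n",
    "    # The final block leaves the answer empty for the model to complete.\n",
    "    blocks.append(f\"Context:\\n{context}\\n\\nQuestion:\\n{question}\\n\\nAnswer:\\n\")\n",
    "    return \"\\n---\\n\\n\".join(blocks)\n",
    "\n",
    "\n",
    "examples = [\n",
    "    (\n",
    "        \"Housing bubbles were first widely discussed in the early 2000s.\",\n",
    "        \"When were housing bubbles first widely discussed?\",\n",
    "        \"In the early 2000s.\",\n",
    "    ),\n",
    "]\n",
    "few_shot_prompt = build_few_shot_prompt(\n",
    "    examples,\n",
    "    context=\"Housing hashtags help property listings reach more buyers.\",\n",
    "    question=\"Why are housing hashtags useful?\",\n",
    ")\n",
    "print(few_shot_prompt)\n",
    "```\n",
    "\n",
    "Keeping every example in the same Context/Question/Answer layout is what lets the model infer the expected answer format from the examples."
   ]
  },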
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "94d80fb55f48"
   },
   "source": [
    "### Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b620d23a7634"
   },
   "source": [
    "You can evaluate the outputs of the question and answering task if the ground truth answers of each question are available. In zero-shot prompting, you can only use `open domain` questions. However, with `closed domain` questions, you can add context and evaluate similarly.  To showcase how that will work, start by creating a simple dataframe with questions and ground truth answers. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "8e813a463531"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer_groundtruth</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is a mortgage?</td>\n",
       "      <td>a loan secured by a mortgage</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What is a condo?</td>\n",
       "      <td>a type of housing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>What is a duplex?</td>\n",
       "      <td>a building with two separate units</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              question                  answer_groundtruth\n",
       "0  What is a mortgage?        a loan secured by a mortgage\n",
       "1     What is a condo?                   a type of housing\n",
       "2    What is a duplex?  a building with two separate units"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qa_data = {\n",
    "    \"question\": [\n",
    "        \"What is a mortgage?\",\n",
    "        \"What is a condo?\",\n",
    "        \"What is a duplex?\",\n",
    "    ],\n",
    "    \"answer_groundtruth\": [\"a loan secured by a mortgage\", \"a type of housing\", \"a building with two separate units\"],\n",
    "}\n",
    "qa_data_df = pd.DataFrame(qa_data)\n",
    "qa_data_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "951a147dc79d"
   },
   "source": [
    "Now that you have the data with questions and ground truth answers, you can call the PaLM 2 generation model to each review row using the `apply` function. Each row will use the dynamic prompt to predict the answer using the PaLM API. We will save the results in `answer_prediction` column.  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "ffc47e0cb5b9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer_groundtruth</th>\n",
       "      <th>answer_prediction</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is a mortgage?</td>\n",
       "      <td>a loan secured by a mortgage</td>\n",
       "      <td>a loan secured by a mortgage</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What is a condo?</td>\n",
       "      <td>a type of housing</td>\n",
       "      <td>a type of housing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>What is a duplex?</td>\n",
       "      <td>a building with two separate units</td>\n",
       "      <td>a building with two separate apartments</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              question                  answer_groundtruth  \\\n",
       "0  What is a mortgage?        a loan secured by a mortgage   \n",
       "1     What is a condo?                   a type of housing   \n",
       "2    What is a duplex?  a building with two separate units   \n",
       "\n",
       "                         answer_prediction  \n",
       "0             a loan secured by a mortgage  \n",
       "1                        a type of housing  \n",
       "2  a building with two separate apartments  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_answer(row):\n",
    "    prompt = f\"\"\"Answer the following question as precise as possible.\\n\\n\n",
    "            question: {row}\n",
    "            answer:\n",
    "              \"\"\"\n",
    "    return generation_model.predict(\n",
    "        prompt=prompt,\n",
    "    ).text\n",
    "\n",
    "\n",
    "qa_data_df[\"answer_prediction\"] = qa_data_df[\"question\"].apply(get_answer)\n",
    "qa_data_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6fe997dbf788"
   },
   "source": [
    "You may want to evaluate the answers predicted by the PaLM API. However, it will be more complex than the text classification since the answers may differ from ground truth and may be presented in slightly more/fewer words. \n",
    "\n",
    "For example, you can observe the question \"What is the name of the Earth's largest ocean?\" and see that model predicted  \"Pacific Ocean\" when a ground truth label is \"The Pacific Ocean\" with the extra \"The.\" Now, if you use the simple classification metrics, then you will consider this as a wrong prediction since original and predicted strings have a difference. However, you can see that the answer is correct since an extra \"The\" is causing the issue. It's a simple string comparison problem.\n",
    "\n",
    "The solution to string comparison where both `ground_thruth` and `predicted` may have some extra or fewer letters, one approach is to use a fuzzy matching algorithm. \n",
    "Fuzzy string matching uses [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) to calculate the differences between two strings. \n",
    "\n",
    "For example, the Levenshtein distance between \"kitten\" and \"sitting\" is 3, since the following 3 edits change one into the other, and there is no way to do it with fewer than 3 edits:\n",
    "\n",
    "* kitten → sitten (substitution of \"s\" for \"k\"),\n",
    "* sitten → sittin (substitution of \"i\" for \"e\"),\n",
    "* sittin → sitting (insertion of \"g\" at the end).\n",
    "\n",
    "\n",
    "Here's another example, but this time using `fuzzywuzzy`  library, which gives us the same `Levenshtein distance` between two strings but in ratio. The ratio raw score measures the string's similarity as an int in the range [0, 100]. For two strings X and Y, the score is defined by int(round((2.0 * M / T) * 100)) where T is the total number of characters in both strings, and M is the number of matches in the two strings. \n",
    "\n",
    "Read more here about the [ratio formula](https://anhaidgroup.github.io/py_stringmatching/v0.3.x/Ratio.html) : \n",
    "\n",
    "You can see one example to understand this furhter. \n",
    "```\n",
    "String1: \"this is a test\"\n",
    "String2: \"this is a test!\"\n",
    "\n",
    "Fuzz Ratio => 97  #\n",
    "\n",
    "Fuzz Partial Ratio => 100  #Since most characters are the same and in a similar sequence, the algorithm calculates the partial ratio as 100 and ignores simple additions (new characters). \n",
    "```\n"
   ]
  },
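  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two measures above concrete, here is a minimal standard-library sketch. It is not a description of how `fuzzywuzzy` works internally: `levenshtein` and `similarity` are hypothetical helpers written for illustration, and `difflib.SequenceMatcher.ratio()` is used because it computes the same 2.0 * M / T formula.\n",
    "\n",
    "```python\n",
    "from difflib import SequenceMatcher\n",
    "\n",
    "\n",
    "def levenshtein(a: str, b: str) -> int:\n",
    "    \"\"\"Minimum number of single-character insertions, deletions, or substitutions.\"\"\"\n",
    "    previous = list(range(len(b) + 1))\n",
    "    for i, char_a in enumerate(a, start=1):\n",
    "        current = [i]\n",
    "        for j, char_b in enumerate(b, start=1):\n",
    "            current.append(\n",
    "                min(\n",
    "                    previous[j] + 1,  # deletion\n",
    "                    current[j - 1] + 1,  # insertion\n",
    "                    previous[j - 1] + (char_a != char_b),  # substitution\n",
    "                )\n",
    "            )\n",
    "        previous = current\n",
    "    return previous[-1]\n",
    "\n",
    "\n",
    "def similarity(a: str, b: str) -> int:\n",
    "    \"\"\"Ratio score in [0, 100], computed as int(round(2.0 * M / T * 100)).\"\"\"\n",
    "    return int(round(SequenceMatcher(None, a, b).ratio() * 100))\n",
    "\n",
    "\n",
    "print(levenshtein(\"kitten\", \"sitting\"))  # 3\n",
    "print(similarity(\"this is a test\", \"this is a test!\"))  # 97\n",
    "```\n",
    "\n",
    "In the second call, M is 14 matching characters and T is 29 characters in total, so the score is 2 * 14 / 29, which rounds to 97."
   ]
  },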
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5f048152519f"
   },
   "source": [
    "Now compute a score to perform fuzzy matching:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "040c1f9a175b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>answer_groundtruth</th>\n",
       "      <th>answer_prediction</th>\n",
       "      <th>match_score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is a mortgage?</td>\n",
       "      <td>a loan secured by a mortgage</td>\n",
       "      <td>a loan secured by a mortgage</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What is a condo?</td>\n",
       "      <td>a type of housing</td>\n",
       "      <td>a type of housing</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>What is a duplex?</td>\n",
       "      <td>a building with two separate units</td>\n",
       "      <td>a building with two separate apartments</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              question                  answer_groundtruth  \\\n",
       "0  What is a mortgage?        a loan secured by a mortgage   \n",
       "1     What is a condo?                   a type of housing   \n",
       "2    What is a duplex?  a building with two separate units   \n",
       "\n",
       "                         answer_prediction  match_score  \n",
       "0             a loan secured by a mortgage          100  \n",
       "1                        a type of housing          100  \n",
       "2  a building with two separate apartments           88  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from fuzzywuzzy import fuzz\n",
    "\n",
    "\n",
    "def get_fuzzy_match(df):\n",
    "    return fuzz.partial_ratio(df[\"answer_groundtruth\"], df[\"answer_prediction\"])\n",
    "\n",
    "\n",
    "qa_data_df[\"match_score\"] = qa_data_df.apply(get_fuzzy_match, axis=1)\n",
    "qa_data_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "11e266c49860"
   },
   "source": [
    "Now that you have the individual match score (partial), you can take the mean or average of the whole column to get a sense of overall data. \n",
    "Scores closer to 100 mean PaLM 2 can predict closer to ground truth; if the score is towards 50 or 0, it did not perform well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "dae6a92a7650"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the average match score of all predicted answer from PaLM 2 is :  96.0  %\n"
     ]
    }
   ],
   "source": [
    "print(\n",
    "    \"the average match score of all predicted answer from PaLM 2 is : \",\n",
    "    qa_data_df[\"match_score\"].mean(),\n",
    "    \" %\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d9e78972cad1"
   },
   "source": [
    "In this case, you get 100% as the mean score, even though some predictions were missing some words. That means you are very close to the ground truth, and some answers are just missing the exact verboseness of the ground truth. "
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "question_answering.ipynb",
   "toc_visible": true
  },
  "environment": {
   "kernel": "python3",
   "name": "common-cpu.m110",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/base-cpu:m110"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
